I. Overview
Kubernetes provides solid support for stateful applications such as ZooKeeper and etcd through StatefulSets. A StatefulSet offers:
- A stable, unique network identity for each pod
- Stable persistent storage, backed by PV/PVC
- Ordered startup and shutdown of pods, for graceful deployment and scaling
This article walks through deploying ZooKeeper and etcd as stateful services on a Kubernetes cluster, using Ceph for data persistence.
II. Summary
- Kubernetes StatefulSets, StorageClasses, PVs, and PVCs, combined with Ceph RBD, support running stateful services such as ZooKeeper and etcd on a Kubernetes cluster well.
- Kubernetes never deletes already-created PV and PVC objects on its own, to guard against accidental deletion. If you are sure you want to delete a PV/PVC, you must also manually delete the corresponding RBD image on the Ceph side.
- Pitfall encountered: the Ceph client user referenced in the StorageClass must have mon rw and rbd rwx capabilities. Without mon write permission, releasing the RBD lock fails and the RBD image cannot be attached to another Kubernetes worker node.
- ZooKeeper node health is checked with probes; if a node is unhealthy, Kubernetes deletes the pod and automatically recreates it, effectively restarting the ZooKeeper node. Because ZooKeeper 3.4 configures the ensemble by statically loading zoo.cfg, every node in the ensemble must be restarted whenever a ZooKeeper pod's IP changes.
- The etcd deployment approach needs improvement. This experiment deploys the etcd cluster statically, so whenever etcd membership changes you must run etcdctl member remove/add by hand, which severely limits automatic failure recovery and scaling. The deployment should be reworked to bootstrap etcd dynamically via DNS or etcd discovery so that etcd runs well on Kubernetes.
III. ZooKeeper Cluster Deployment
1. Pull the image
docker pull gcr.mirrors.ustc.edu.cn/google_containers/kubernetes-zookeeper:1.0-3.4.10
docker tag gcr.mirrors.ustc.edu.cn/google_containers/kubernetes-zookeeper:1.0-3.4.10 172.16.18.100:5000/gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10
docker push 172.16.18.100:5000/gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10
2. Define the Ceph secret
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: default
type: kubernetes.io/rbd
data:
  key: QVFBYy9ndGFRUno4QlJBQXMxTjR3WnlqN29PK3VrMzI1a05aZ3c9PQo=
EOF
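The key field is the base64-encoded Ceph key. Assuming the admin user, one way to produce it on a Ceph node:
ceph auth get-key client.admin | base64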
3. Define the RBD StorageClass
cat << EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph
parameters:
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: layering
  monitors: 172.16.13.223
  pool: k8s
  userId: admin
  userSecretName: ceph-secret
provisioner: kubernetes.io/rbd
reclaimPolicy: Delete
EOF
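Before deploying ZooKeeper, it can be worth confirming that dynamic provisioning works with a throwaway claim (rbd-test is a hypothetical name used only for this check):
cat << EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rbd-test
  annotations:
    volume.beta.kubernetes.io/storage-class: ceph
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc rbd-test
The claim should reach Bound status within a few seconds; remember to delete it (and its RBD image) afterwards.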
4. Create the ZooKeeper cluster
Use RBD to store each ZooKeeper node's data:
cat << EOF | kubectl create -f -
---
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zk
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - zk
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: kubernetes-zookeeper
        imagePullPolicy: Always
        image: "172.16.18.100:5000/gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper \
          --servers=3 \
          --data_dir=/var/lib/zookeeper/data \
          --data_log_dir=/var/lib/zookeeper/data/log \
          --conf_dir=/opt/zookeeper/conf \
          --client_port=2181 \
          --election_port=3888 \
          --server_port=2888 \
          --tick_time=2000 \
          --init_limit=10 \
          --sync_limit=5 \
          --heap=512M \
          --max_client_cnxns=60 \
          --snap_retain_count=3 \
          --purge_interval=12 \
          --max_session_timeout=40000 \
          --min_session_timeout=4000 \
          --log_level=INFO"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.beta.kubernetes.io/storage-class: ceph
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
EOF
Check the result:
[root@172 zookeeper]# kubectl get no
NAME STATUS ROLES AGE VERSION
172.16.20.10 Ready <none> 50m v1.8.2
172.16.20.11 Ready <none> 2h v1.8.2
172.16.20.12 Ready <none> 1h v1.8.2
[root@172 zookeeper]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE
zk-0 1/1 Running 0 8m 192.168.5.162 172.16.20.10
zk-1 1/1 Running 0 1h 192.168.2.146 172.16.20.11
[root@172 zookeeper]# kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv/pvc-226cb8f0-d322-11e7-9581-000c29f99475 1Gi RWO Delete Bound default/datadir-zk-0 ceph 1h
pv/pvc-22703ece-d322-11e7-9581-000c29f99475 1Gi RWO Delete Bound default/datadir-zk-1 ceph 1h
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc/datadir-zk-0 Bound pvc-226cb8f0-d322-11e7-9581-000c29f99475 1Gi RWO ceph 1h
pvc/datadir-zk-1 Bound pvc-22703ece-d322-11e7-9581-000c29f99475 1Gi RWO ceph 1h
The RBD lock held for the zk-0 pod's image:
[root@ceph1 ceph]# rbd lock list kubernetes-dynamic-pvc-227b45e5-d322-11e7-90ab-000c29f99475 -p k8s --user admin
There is 1 exclusive lock on this image.
Locker ID Address
client.24146 kubelet_lock_magic_172.16.20.10 172.16.20.10:0/1606152350
5. Test pod migration
Cordon node 172.16.20.10 (mark it unschedulable) so that the zk-0 pod is rescheduled onto 172.16.20.12:
kubectl cordon 172.16.20.10
[root@172 zookeeper]# kubectl get no
NAME STATUS ROLES AGE VERSION
172.16.20.10 Ready,SchedulingDisabled <none> 58m v1.8.2
172.16.20.11 Ready <none> 2h v1.8.2
172.16.20.12 Ready <none> 1h v1.8.2
kubectl delete po zk-0
Watch zk-0 migrate:
[root@172 zookeeper]# kubectl get po -owide -w
NAME READY STATUS RESTARTS AGE IP NODE
zk-0 1/1 Running 0 14m 192.168.5.162 172.16.20.10
zk-1 1/1 Running 0 1h 192.168.2.146 172.16.20.11
zk-0 1/1 Terminating 0 16m 192.168.5.162 172.16.20.10
zk-0 0/1 Terminating 0 16m <none> 172.16.20.10
zk-0 0/1 Terminating 0 16m <none> 172.16.20.10
zk-0 0/1 Terminating 0 16m <none> 172.16.20.10
zk-0 0/1 Terminating 0 16m <none> 172.16.20.10
zk-0 0/1 Terminating 0 16m <none> 172.16.20.10
zk-0 0/1 Pending 0 0s <none> <none>
zk-0 0/1 Pending 0 0s <none> 172.16.20.12
zk-0 0/1 ContainerCreating 0 0s <none> 172.16.20.12
zk-0 0/1 Running 0 3s 192.168.3.4 172.16.20.12
zk-0 has now migrated to 172.16.20.12. Checking the RBD lock again: right after the migration the lock still shows the old holder, and a subsequent query shows it has moved to 172.16.20.12:
[root@ceph1 ceph]# rbd lock list kubernetes-dynamic-pvc-227b45e5-d322-11e7-90ab-000c29f99475 -p k8s --user admin
There is 1 exclusive lock on this image.
Locker ID Address
client.24146 kubelet_lock_magic_172.16.20.10 172.16.20.10:0/1606152350
[root@ceph1 ceph]# rbd lock list kubernetes-dynamic-pvc-227b45e5-d322-11e7-90ab-000c29f99475 -p k8s --user admin
There is 1 exclusive lock on this image.
Locker ID Address
client.24154 kubelet_lock_magic_172.16.20.12 172.16.20.12:0/3715989358
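Once the test is done, remember to make the node schedulable again:
kubectl uncordon 172.16.20.10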
When I previously tested this zk pod migration on a different Ceph cluster, releasing the lock always failed. Analysis pointed to the Ceph account in use lacking the required permissions. The recorded errors:
Nov 27 10:45:55 172 kubelet: W1127 10:45:55.551768 11556 rbd_util.go:471] rbd: no watchers on kubernetes-dynamic-pvc-f35a411e-d317-11e7-90ab-000c29f99475
Nov 27 10:45:55 172 kubelet: I1127 10:45:55.694126 11556 rbd_util.go:181] remove orphaned locker kubelet_lock_magic_172.16.20.12 from client client.171490: err exit status 13, output: 2017-11-27 10:45:55.570483 7fbdbe922d40 -1 did not load config file, using default settings.
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600816 7fbdbe922d40 -1 Errors while parsing config file!
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600824 7fbdbe922d40 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600825 7fbdbe922d40 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600825 7fbdbe922d40 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602492 7fbdbe922d40 -1 Errors while parsing config file!
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602494 7fbdbe922d40 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602495 7fbdbe922d40 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602496 7fbdbe922d40 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.651594 7fbdbe922d40 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.k8s.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: rbd: releasing lock failed: (13) Permission denied
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.682470 7fbdbe922d40 -1 librbd: unable to blacklist client: (13) Permission denied
The corresponding code in the Kubernetes rbd volume plugin (rbd_util.go); the annotated line is where rbd lock remove returned the error:
if lock {
    // check if lock is already held for this host by matching lock_id and rbd lock id
    if strings.Contains(output, lock_id) {
        // this host already holds the lock, exit
        glog.V(1).Infof("rbd: lock already held for %s", lock_id)
        return nil
    }
    // clean up orphaned lock if no watcher on the image
    used, statusErr := util.rbdStatus(&b)
    if statusErr == nil && !used {
        re := regexp.MustCompile("client.* " + kubeLockMagic + ".*")
        locks := re.FindAllStringSubmatch(output, -1)
        for _, v := range locks {
            if len(v) > 0 {
                lockInfo := strings.Split(v[0], " ")
                if len(lockInfo) > 2 {
                    args := []string{"lock", "remove", b.Image, lockInfo[1], lockInfo[0], "--pool", b.Pool, "--id", b.Id, "-m", mon}
                    args = append(args, secret_opt...)
                    cmd, err = b.exec.Run("rbd", args...)
                    // the rbd lock remove command returned an error here (see the kubelet log above)
                    glog.Infof("remove orphaned locker %s from client %s: err %v, output: %s", lockInfo[1], lockInfo[0], err, string(cmd))
                }
            }
        }
    }
    // hold a lock: rbd lock add
    args := []string{"lock", "add", b.Image, lock_id, "--pool", b.Pool, "--id", b.Id, "-m", mon}
    args = append(args, secret_opt...)
    cmd, err = b.exec.Run("rbd", args...)
}
As the log shows, the rbd lock remove operation was rejected for lack of permission: rbd: releasing lock failed: (13) Permission denied.
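Consistent with the pitfall noted in the summary, the fix is to grant the Ceph client user mon rw plus rwx on the pool. A sketch, assuming the user is client.k8s (as the keyring path in the log suggests) and the pool is k8s:
ceph auth caps client.k8s mon 'allow rw' osd 'allow rwx pool=k8s'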
6. Test scaling up
Scale the ZooKeeper ensemble from 2 nodes to 3.
With 2 nodes, zoo.cfg defines two servers:
zookeeper@zk-0:/opt/zookeeper/conf$ cat zoo.cfg
#This file was autogenerated DO NOT EDIT
clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/data/log
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12
server.1=zk-0.zk-hs.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local:2888:3888
Run kubectl edit statefulset zk and change replicas to 3 and the --servers=3 argument of start-zookeeper to match.
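For a non-interactive alternative, the replica count can also be patched directly; note that the --servers argument inside the container command still has to be edited to match:
kubectl patch statefulset zk -p '{"spec":{"replicas":3}}'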
Then watch the pods:
[root@172 zookeeper]# kubectl get po -owide -w
NAME READY STATUS RESTARTS AGE IP NODE
zk-0 1/1 Running 0 1h 192.168.5.170 172.16.20.10
zk-1 1/1 Running 0 1h 192.168.3.12 172.16.20.12
zk-2 0/1 Pending 0 0s <none> <none>
zk-2 0/1 Pending 0 0s <none> 172.16.20.11
zk-2 0/1 ContainerCreating 0 0s <none> 172.16.20.11
zk-2 0/1 Running 0 1s 192.168.2.154 172.16.20.11
zk-2 1/1 Running 0 11s 192.168.2.154 172.16.20.11
zk-1 1/1 Terminating 0 1h 192.168.3.12 172.16.20.12
zk-1 0/1 Terminating 0 1h <none> 172.16.20.12
zk-1 0/1 Terminating 0 1h <none> 172.16.20.12
zk-1 0/1 Terminating 0 1h <none> 172.16.20.12
zk-1 0/1 Terminating 0 1h <none> 172.16.20.12
zk-1 0/1 Pending 0 0s <none> <none>
zk-1 0/1 Pending 0 0s <none> 172.16.20.12
zk-1 0/1 ContainerCreating 0 0s <none> 172.16.20.12
zk-1 0/1 Running 0 2s 192.168.3.13 172.16.20.12
zk-1 1/1 Running 0 20s 192.168.3.13 172.16.20.12
zk-0 1/1 Terminating 0 1h 192.168.5.170 172.16.20.10
zk-0 0/1 Terminating 0 1h <none> 172.16.20.10
zk-0 0/1 Terminating 0 1h <none> 172.16.20.10
zk-0 0/1 Terminating 0 1h <none> 172.16.20.10
zk-0 0/1 Terminating 0 1h <none> 172.16.20.10
zk-0 0/1 Pending 0 0s <none> <none>
zk-0 0/1 Pending 0 0s <none> 172.16.20.10
zk-0 0/1 ContainerCreating 0 0s <none> 172.16.20.10
zk-0 0/1 Running 0 2s 192.168.5.171 172.16.20.10
zk-0 1/1 Running 0 12s 192.168.5.171 172.16.20.10
Note that zk-0 and zk-1 were both restarted, so they pick up the new zoo.cfg and the ensemble stays correctly configured.
The new zoo.cfg lists three servers:
[root@172 ~]# kubectl exec zk-0 -- cat /opt/zookeeper/conf/zoo.cfg
#This file was autogenerated DO NOT EDIT
clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/data/log
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12
server.1=zk-0.zk-hs.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local:2888:3888
server.3=zk-2.zk-hs.default.svc.cluster.local:2888:3888
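To confirm each member's role after the scale-up, you can query ZooKeeper's stat four-letter command (this assumes nc is available in the image, which the zookeeper-ready probe script implies):
for i in 0 1 2; do
  kubectl exec zk-$i -- sh -c 'echo stat | nc localhost 2181' | grep Mode
done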
7. Test scaling down
When scaling down, the zk ensemble again automatically restarted every node. The process:
[root@172 ~]# kubectl get po -owide -w
NAME READY STATUS RESTARTS AGE IP NODE
zk-0 1/1 Running 0 5m 192.168.5.171 172.16.20.10
zk-1 1/1 Running 0 6m 192.168.3.13 172.16.20.12
zk-2 1/1 Running 0 7m 192.168.2.154 172.16.20.11
zk-2 1/1 Terminating 0 7m 192.168.2.154 172.16.20.11
zk-1 1/1 Terminating 0 7m 192.168.3.13 172.16.20.12
zk-2 0/1 Terminating 0 8m <none> 172.16.20.11
zk-1 0/1 Terminating 0 7m <none> 172.16.20.12
zk-2 0/1 Terminating 0 8m <none> 172.16.20.11
zk-1 0/1 Terminating 0 7m <none> 172.16.20.12
zk-1 0/1 Terminating 0 7m <none> 172.16.20.12
zk-1 0/1 Terminating 0 7m <none> 172.16.20.12
zk-1 0/1 Pending 0 0s <none> <none>
zk-1 0/1 Pending 0 0s <none> 172.16.20.12
zk-1 0/1 ContainerCreating 0 0s <none> 172.16.20.12
zk-1 0/1 Running 0 2s 192.168.3.14 172.16.20.12
zk-2 0/1 Terminating 0 8m <none> 172.16.20.11
zk-2 0/1 Terminating 0 8m <none> 172.16.20.11
zk-1 1/1 Running 0 19s 192.168.3.14 172.16.20.12
zk-0 1/1 Terminating 0 7m 192.168.5.171 172.16.20.10
zk-0 0/1 Terminating 0 7m <none> 172.16.20.10
zk-0 0/1 Terminating 0 7m <none> 172.16.20.10
zk-0 0/1 Terminating 0 7m <none> 172.16.20.10
zk-0 0/1 Pending 0 0s <none> <none>
zk-0 0/1 Pending 0 0s <none> 172.16.20.10
zk-0 0/1 ContainerCreating 0 0s <none> 172.16.20.10
zk-0 0/1 Running 0 3s 192.168.5.172 172.16.20.10
zk-0 1/1 Running 0 13s 192.168.5.172 172.16.20.10
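The regenerated zoo.cfg should again list only two servers, which can be confirmed with:
kubectl exec zk-0 -- sh -c 'grep ^server /opt/zookeeper/conf/zoo.cfg'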
IV. etcd Cluster Deployment
1. Create the etcd cluster
cat << 'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  name: "etcd"
  annotations:
    # Create endpoints also if the related pod isn't ready
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  ports:
  - port: 2379
    name: client
  - port: 2380
    name: peer
  clusterIP: None
  selector:
    component: "etcd"
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: "etcd"
  labels:
    component: "etcd"
spec:
  serviceName: "etcd"
  # changing replicas value will require a manual etcdctl member remove/add
  # command (remove before decreasing and add after increasing)
  replicas: 3
  template:
    metadata:
      name: "etcd"
      labels:
        component: "etcd"
    spec:
      containers:
      - name: "etcd"
        image: "172.16.18.100:5000/quay.io/coreos/etcd:v3.2.3"
        ports:
        - containerPort: 2379
          name: client
        - containerPort: 2380
          name: peer
        env:
        - name: CLUSTER_SIZE
          value: "3"
        - name: SET_NAME
          value: "etcd"
        volumeMounts:
        - name: data
          mountPath: /var/run/etcd
        command:
        - "/bin/sh"
        - "-ecx"
        - |
          IP=$(hostname -i)
          for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
            while true; do
              echo "Waiting for ${SET_NAME}-${i}.${SET_NAME} to come up"
              ping -W 1 -c 1 ${SET_NAME}-${i}.${SET_NAME}.default.svc.cluster.local > /dev/null && break
              sleep 1s
            done
          done
          PEERS=""
          for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
            PEERS="${PEERS}${PEERS:+,}${SET_NAME}-${i}=http://${SET_NAME}-${i}.${SET_NAME}.default.svc.cluster.local:2380"
          done
          # start etcd. If cluster is already initialized the `--initial-*` options will be ignored.
          exec etcd --name ${HOSTNAME} \
            --listen-peer-urls http://${IP}:2380 \
            --listen-client-urls http://${IP}:2379,http://127.0.0.1:2379 \
            --advertise-client-urls http://${HOSTNAME}.${SET_NAME}:2379 \
            --initial-advertise-peer-urls http://${HOSTNAME}.${SET_NAME}:2380 \
            --initial-cluster-token etcd-cluster-1 \
            --initial-cluster ${PEERS} \
            --initial-cluster-state new \
            --data-dir /var/run/etcd/default.etcd
  ## Dynamic PV provisioning via the "ceph" storage class defined above;
  ## in production define your own way to use PV claims.
  volumeClaimTemplates:
  - metadata:
      name: data
      annotations:
        volume.beta.kubernetes.io/storage-class: ceph
    spec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
EOF
After creation, the pods look like this:
[root@172 etcd]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE
etcd-0 1/1 Running 0 15m 192.168.5.174 172.16.20.10
etcd-1 1/1 Running 0 15m 192.168.3.16 172.16.20.12
etcd-2 1/1 Running 0 5s 192.168.5.176 172.16.20.10
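A quick smoke test: write a key through one member and read it back through another (etcdctl in this image defaults to the v2 API):
kubectl exec etcd-0 -- etcdctl set /test/hello world
kubectl exec etcd-1 -- etcdctl get /test/hello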
2. Test scaling down
kubectl scale statefulset etcd --replicas=2
[root@172 ~]# kubectl get po -owide -w
NAME READY STATUS RESTARTS AGE IP NODE
etcd-0 1/1 Running 0 17m 192.168.5.174 172.16.20.10
etcd-1 1/1 Running 0 17m 192.168.3.16 172.16.20.12
etcd-2 1/1 Running 0 1m 192.168.5.176 172.16.20.10
etcd-2 1/1 Terminating 0 1m 192.168.5.176 172.16.20.10
etcd-2 0/1 Terminating 0 1m <none> 172.16.20.10
Check cluster health:
kubectl exec etcd-0 -- etcdctl cluster-health
failed to check the health of member 42c8b94265b9b79a on http://etcd-2.etcd:2379: Get http://etcd-2.etcd:2379/health: dial tcp: lookup etcd-2.etcd on 10.96.0.10:53: no such host
member 42c8b94265b9b79a is unreachable: [http://etcd-2.etcd:2379] are all unreachable
member 9869f0647883a00d is healthy: got healthy result from http://etcd-1.etcd:2379
member c799a6ef06bc8c14 is healthy: got healthy result from http://etcd-0.etcd:2379
cluster is healthy
After the scale-down, etcd-2 is not automatically removed from the etcd cluster, so this etcd image's support for automatic scaling is clearly limited. Remove etcd-2 manually:
[root@172 etcd]# kubectl exec etcd-0 -- etcdctl member remove 42c8b94265b9b79a
Removed member 42c8b94265b9b79a from cluster
[root@172 etcd]# kubectl exec etcd-0 -- etcdctl cluster-health
member 9869f0647883a00d is healthy: got healthy result from http://etcd-1.etcd:2379
member c799a6ef06bc8c14 is healthy: got healthy result from http://etcd-0.etcd:2379
cluster is healthy
3. Test scaling up
As the startup script in etcd.yaml shows, a newly started etcd pod always uses --initial-cluster-state new, so this image does not support dynamic scale-up. The startup script should be reworked to bootstrap the cluster dynamically, for example via DNS discovery, before the etcd cluster can be scaled up and down properly on Kubernetes.
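One possible direction, sketched here as an untested assumption: rename the headless Service's ports to etcd-server and etcd-client so that Kubernetes publishes the _etcd-server._tcp and _etcd-client._tcp SRV records that etcd's DNS discovery looks up, then drop the static --initial-cluster flags in favor of --discovery-srv:
exec etcd --name ${HOSTNAME} \
  --discovery-srv ${SET_NAME}.default.svc.cluster.local \
  --listen-peer-urls http://${IP}:2380 \
  --listen-client-urls http://${IP}:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://${HOSTNAME}.${SET_NAME}:2379 \
  --initial-advertise-peer-urls http://${HOSTNAME}.${SET_NAME}:2380 \
  --data-dir /var/run/etcd/default.etcd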